Method of preparing a modified granulocyte colony stimulating factor (G-CSF) with reduced immunogenicity

ABSTRACT

A method of preparing a modified granulocyte colony stimulating factor (G-CSF) protein having reduced immunogenicity relative to human G-CSF comprises the steps of (i) identifying one or more potential T-cell epitopes within the amino acid sequence of human G-CSF (SEQ ID NO: 1); (ii) designing at least one sequence variant of at least one potential T-cell epitope identified in step (i), wherein the sequence variant eliminates or substantially reduces the MHC class II binding activity of the potential T-cell epitope; (iii) preparing, by recombinant DNA techniques, at least one modified G-CSF protein including a sequence variant designed in step (ii); (iv) evaluating at least one modified G-CSF protein prepared in step (iii) for G-CSF activity and immunogenicity; and (v) selecting a modified G-CSF protein evaluated in step (iv) that has substantially the same therapeutic G-CSF biological activity as, but substantially less immunogenicity than, human G-CSF.

This application is the National Stage of International Application No. PCT/EP02/01171, filed on Feb. 5, 2002, which claims priority from European Patent Application No. 01103954.2, filed on Feb. 19, 2001 and European Patent Application No. 01102617.6, filed on Feb. 6, 2001.

FIELD OF THE INVENTION

The present invention relates to polypeptides to be administered especially to humans and in particular for therapeutic use. The polypeptides are modified polypeptides whereby the modification results in a reduced propensity for the polypeptide to elicit an immune response upon administration to the human subject. The invention in particular relates to the modification of human granulocyte colony stimulating factor (G-CSF) to result in G-CSF protein variants that are substantially non-immunogenic or less immunogenic than any non-modified counterpart when used in vivo. The invention relates furthermore to T-cell epitope peptides derived from said non-modified protein by means of which it is possible to create modified granulocyte colony stimulating factor variants with reduced immunogenicity.

BACKGROUND OF THE INVENTION

There are many instances whereby the efficacy of a therapeutic protein is limited by an unwanted immune reaction to the therapeutic protein. Several mouse monoclonal antibodies have shown promise as therapies in a number of human disease settings but in certain cases have failed due to the induction of significant degrees of a human anti-murine antibody (HAMA) response [Schroff, R. W. et al (1985) Cancer Res. 45: 879-885; Shawler, D. L. et al (1985) J. Immunol. 135: 1530-1535]. For monoclonal antibodies, a number of techniques have been developed in an attempt to reduce the HAMA response [WO 89/09622; EP 0239400; EP 0438310; WO 91/06667]. These recombinant DNA approaches have generally reduced the mouse genetic information in the final antibody construct whilst increasing the human genetic information in the final construct. Notwithstanding, the resultant “humanized” antibodies have, in several cases, still elicited an immune response in patients [Issacs J. D. (1990) Sem. Immunol. 2: 449, 456; Rebello, P. R. et al (1999) Transplantation 68: 1417-1420].

Antibodies are not the only class of polypeptide molecule administered as a therapeutic agent against which an immune response may be mounted. Even proteins of human origin and with the same amino acid sequences as occur within humans can still induce an immune response in humans. Notable examples include the therapeutic use of granulocyte-macrophage colony stimulating factor [Wadhwa, M. et al (1999) Clin. Cancer Res. 5: 1353-1361] and interferon alpha 2 [Russo, D. et al (1996) Bri. J. Haem. 94: 300-305; Stein, R. et al (1988) New Engl. J. Med. 318: 1409-1413] amongst others.

A principal factor in the induction of an immune response is the presence within the protein of peptides that can stimulate the activity of T-cells via presentation on MHC class II molecules, so-called “T-cell epitopes”. Such potential T-cell epitopes are commonly defined as any amino acid residue sequence with the ability to bind to MHC Class II molecules. Such T-cell epitopes can be measured to establish MHC binding. Implicitly, a “T-cell epitope” means an epitope which when bound to MHC molecules can be recognized by a T-cell receptor (TCR), and which can, at least in principle, cause the activation of these T-cells by engaging a TCR to promote a T-cell response. It is, however, usually understood that certain peptides which are found to bind to MHC Class II molecules may be retained in a protein sequence because such peptides are recognized as “self” within the organism into which the final protein is administered.

It is known that certain of these T-cell epitope peptides can be released during the degradation of peptides, polypeptides or proteins within cells and subsequently be presented by molecules of the major histocompatibility complex (MHC) in order to trigger the activation of T-cells. For peptides presented by MHC Class II, such activation of T-cells can then give rise, for example, to an antibody response by direct stimulation of B-cells to produce such antibodies.

MHC Class II molecules are a group of highly polymorphic proteins which play a central role in helper T-cell selection and activation. The human leukocyte antigen group DR (HLA-DR) are the predominant isotype of this group of proteins and are the major focus of the present invention. However, isotypes HLA-DQ and HLA-DP perform similar functions, hence the present invention is equally applicable to these. The MHC class II DR molecule is made of an alpha and a beta chain which insert at their C-termini through the cell membrane. Each hetero-dimer possesses a ligand binding domain which binds to peptides varying between 9 and 20 amino acids in length, although the binding groove can accommodate a maximum of 11 amino acids. The ligand binding domain is comprised of amino acids 1 to 85 of the alpha chain, and amino acids 1 to 94 of the beta chain. DQ molecules have recently been shown to have a homologous structure and the DP family proteins are also expected to be very similar. In humans approximately 70 different allotypes of the DR isotype are known, for DQ there are 30 different allotypes and for DP 47 different allotypes are known. Each individual bears two to four DR alleles, two DQ and two DP alleles. The structure of a number of DR molecules has been solved and such structures point to an open-ended peptide binding groove with a number of hydrophobic pockets which engage hydrophobic residues (pocket residues) of the peptide [Brown et al (1993) Nature 364: 33; Stern et al (1994) Nature 368: 215]. Polymorphism identifying the different allotypes of class II molecule contributes to a wide diversity of different binding surfaces for peptides within the peptide binding groove and at the population level ensures maximal flexibility with regard to the ability to recognize foreign proteins and mount an immune response to pathogenic organisms. There is a considerable amount of polymorphism within the ligand binding domain with distinct “families” within different geographical populations and ethnic groups. This polymorphism affects the binding characteristics of the peptide binding domain, thus different “families” of DR molecules will have specificities for peptides with different sequence properties, although there may be some overlap. This specificity determines recognition of Th-cell epitopes (Class II T-cell response) which are ultimately responsible for driving the antibody response to B-cell epitopes present on the same protein from which the Th-cell epitope is derived. Thus, the immune response to a protein in an individual is heavily influenced by T-cell epitope recognition which is a function of the peptide binding specificity of that individual's HLA-DR allotype. Therefore, in order to identify T-cell epitopes within a protein or peptide in the context of a global population, it is desirable to consider the binding properties of as diverse a set of HLA-DR allotypes as possible, thus covering as high a percentage of the world population as possible.

An immune response to a therapeutic protein, such as the protein which is the object of this invention, proceeds via the MHC class II peptide presentation pathway. Here exogenous proteins are engulfed and processed for presentation in association with MHC class II molecules of the DR, DQ or DP type. MHC Class II molecules are expressed by professional antigen presenting cells (APCs), such as macrophages and dendritic cells amongst others. Engagement of a MHC class II peptide complex by a cognate T-cell receptor on the surface of the T-cell, together with the cross-binding of certain other co-receptors such as the CD4 molecule, can induce an activated state within the T-cell. Activation leads to the release of cytokines further activating other lymphocytes such as B cells to produce antibodies or activating T killer cells as a full cellular immune response.

The ability of a peptide to bind a given MHC class II molecule for presentation on the surface of an APC is dependent on a number of factors, most notably its primary sequence. This will influence both its propensity for proteolytic cleavage and also its affinity for binding within the peptide binding cleft of the MHC class II molecule. The MHC class II/peptide complex on the APC surface presents a binding face to a particular T-cell receptor (TCR) able to recognize determinants provided both by exposed residues of the peptide and the MHC class II molecule.

In the art there are procedures for identifying synthetic peptides able to bind MHC class II molecules (e.g. WO98/52976 and WO00/34317). Such peptides may not function as T-cell epitopes in all situations, particularly in vivo, due to the processing pathways or other phenomena. T-cell epitope identification is the first step to epitope elimination. The identification and removal of potential T-cell epitopes from proteins has been previously disclosed. In the art, methods have been provided to enable the detection of T-cell epitopes, usually by computational means scanning for recognized sequence motifs in experimentally determined T-cell epitopes or alternatively using computational techniques to predict MHC class II-binding peptides and in particular DR-binding peptides.

WO98/52976 and WO00/34317 teach computational threading approaches to identifying polypeptide sequences with the potential to bind a sub-set of human MHC class II DR allotypes. In these teachings, predicted T-cell epitopes are removed by the use of judicious amino acid substitution within the primary sequence of the therapeutic antibody or non-antibody protein of both non-human and human derivation.

Other techniques exploiting soluble complexes of recombinant MHC molecules in combination with synthetic peptides and able to bind to T-cell clones from peripheral blood samples from human or experimental animal subjects have been used in the art [Kern, F. et al (1998) Nature Medicine 4: 975-978; Kwok, W. W. et al (2001) TRENDS in Immunology 22: 583-588] and may also be exploited in an epitope identification strategy.

As depicted above and as a consequence thereof, it would be desirable to identify and to remove, or at least to reduce, T-cell epitopes from a given peptide, polypeptide or protein that is in principle therapeutically valuable but originally immunogenic.

G-CSF is an important haemopoietic cytokine currently used in treatment of indications where an increase in blood neutrophils will provide benefits. These include cancer therapy, various infectious diseases and related conditions such as sepsis. G-CSF is also used alone, or in combination with other compounds and cytokines, in the ex vivo expansion of haemopoietic cells for bone marrow transplantation.

Two forms of human G-CSF are commonly recognized for this cytokine. One is a protein of 177 amino acids, the other a protein of 174 amino acids [Nagata et al. (1986), EMBO J. 5: 575-581]; the 174 amino acid form has been found to have the greatest specific in vivo biological activity. Recombinant DNA techniques have enabled the production of commercial scale quantities of G-CSF exploiting both eukaryotic and prokaryotic host cell expression systems.

The amino acid sequence of human granulocyte colony stimulating factor (G-CSF) (depicted as one-letter code) is as follows:

(SEQ ID NO:1) TPLGPASSLPQSFLLKCLEQVRKIQGDGAALQEKLCATYKLCHPEELVLLGHSLGIPWAPLSSCPSQALQLAGCLSQLHSGLFLYQGLLQALEGISPELGPTLDTLQLDVADFATTIWQQMEELGMAPALQPTQGAMPAFASAFQRRAGGVLVASHLQSFLEVSYRVLRHLAQP.

Other polypeptide analogues and peptide fragments of G-CSF have been previously disclosed, including forms modified by site-specific amino acid substitutions and/or by modification by chemical adducts. Thus U.S. Pat. No. 4,810,643 discloses analogues with the particular Cys residues replaced with another amino acid, and G-CSF with an Ala residue in the first (N-terminal) position. EP 0 335 423 discloses the modification of at least one amino group in a polypeptide having G-CSF activity. EP 0 272 703 discloses G-CSF derivatives having amino acids substituted or deleted near the N terminus. EP 0 459 630 discloses G-CSF derivatives in which Cys 17 and Asp 27 are replaced by Ser residues. EP 0 243 153 discloses G-CSF modified by inactivating at least one yeast KEX2 protease processing site for increased yield in recombinant production and U.S. Pat. No. 4,904,584 discloses lysine altered proteins. WO 90/12874 discloses further Cys altered variants and Australian patent document AU-A-10948/92 discloses the addition of amino acids to either terminus of a G-CSF molecule for the purpose of aiding in the folding of the molecule after prokaryotic expression. AU-76380/91 discloses G-CSF variants at positions 50-56 of the G-CSF 174 amino acid form, and positions 53-59 of the 177 amino acid form. Additional changes at particular His residues were also disclosed.

It is understood that many of the above approaches have been directed towards improvements in the commercial production of G-CSF, for example improved in vitro stability. None of these teachings recognizes the importance of T-cell epitopes to the immunogenic properties of the protein, nor has any been conceived to directly influence said properties in a specific and controlled way according to the scheme of the present invention.

However, there is a continued need for granulocyte colony stimulating factor (G-CSF) analogues with enhanced properties. Desired enhancements include alternative schemes and modalities for the expression and purification of the said therapeutic, but also and especially, improvements in the biological properties of the protein. There is a particular need for enhancement of the in vivo characteristics when administered to the human subject. In this regard, it is highly desired to provide granulocyte colony stimulating factor (G-CSF) with reduced or absent potential to induce an immune response in the human subject.

SUMMARY AND DESCRIPTION OF THE INVENTION

The present invention provides for modified forms of “granulocyte colony stimulating factor (G-CSF)”, in which the immune characteristic is modified by means of reduced or removed numbers of potential T-cell epitopes. The present invention provides for modified forms of human G-CSF with one or more T-cell epitopes removed. The invention discloses sequences identified within the G-CSF primary sequence that are potential T-cell epitopes by virtue of MHC class II binding potential. This disclosure specifically pertains to both recognized forms of the human G-CSF protein, being the 177 amino acid species and the 174 amino acid species.

The invention may be applied to any G-CSF species of molecule with substantially the same primary amino acid sequences as those disclosed herein and would therefore include G-CSF molecules derived by genetic engineering means or other processes, which may not contain either 177 or 174 amino acid residues.

G-CSF proteins such as identified from murine, bovine, canine and other mammalian sources have in common many of the peptide sequences of the present disclosure and have in common many peptide sequences with substantially the same sequence as those of the disclosed listing. Such protein sequences equally therefore fall under the scope of the present invention.

The invention discloses also specific positions within the primary sequence of the molecule according to the invention which have to be altered by specific amino acid substitution, addition or deletion without affecting the biological activity in principle. In cases in which the loss of immunogenicity can be achieved only by a simultaneous loss of biological activity, it is possible to restore said activity by further alterations within the amino acid sequence of the protein.

The invention discloses furthermore methods to produce such modified molecules, above all methods to identify said T-cell epitopes which have to be altered in order to reduce or remove immunogenic sites.

The protein according to this invention would be expected to display an increased circulation time within the human subject and would be of particular benefit in chronic or recurring disease settings such as is the case for a number of indications for granulocyte colony stimulating factor (G-CSF). The present invention provides for modified forms of G-CSF proteins that are expected to display enhanced properties in vivo. These modified G-CSF molecules can be used in pharmaceutical compositions.

In summary the invention relates to the following issues:

-   a modified molecule having the biological activity of human granulocyte colony stimulating factor (G-CSF) and being substantially non-immunogenic or less immunogenic than any non-modified molecule having the same biological activity when used in vivo;
-   an accordingly specified molecule, wherein said loss of immunogenicity is achieved by removing one or more T-cell epitopes derived from the originally non-modified molecule;
-   an accordingly specified molecule, wherein said loss of immunogenicity is achieved by reduction in numbers of MHC allotypes able to bind peptides derived from said molecule;
-   an accordingly specified molecule, wherein one T-cell epitope is removed;
-   an accordingly specified molecule, wherein said originally present T-cell epitopes are MHC class II ligands or peptide sequences which show the ability to stimulate or bind T-cells via presentation on class II;
-   an accordingly specified molecule, wherein said peptide sequences are selected from the group as depicted in Table 1;
-   an accordingly specified molecule, wherein 1-9 amino acid residues, preferably one amino acid residue, in any of the originally present T-cell epitopes are altered;
-   an accordingly specified molecule, wherein the alteration of the amino acid residues is substitution, addition or deletion of originally present amino acid(s) residue(s) by other amino acid residue(s) at specific position(s);
-   an accordingly specified molecule, wherein one or more of the amino acid residue substitutions are carried out as indicated in Table 2;
-   an accordingly specified molecule, wherein (additionally) one or more of the amino acid residue substitutions are carried out as indicated in Table 3 for the reduction in the number of MHC allotypes able to bind peptides derived from said molecule;
-   an accordingly specified molecule, wherein, if necessary, additionally further alteration usually by substitution, addition or deletion of specific amino acid(s) is conducted to restore biological activity of said molecule;
-   a DNA sequence or molecule which codes for any of the modified molecules as specified above and below;
-   a pharmaceutical composition comprising a modified molecule having the biological activity of granulocyte colony stimulating factor (G-CSF) as defined above and/or in the claims, optionally together with a pharmaceutically acceptable carrier, diluent or excipient;
-   a method for manufacturing a modified molecule having the biological activity of granulocyte colony stimulating factor (G-CSF) as defined in any of the above-cited claims comprising the following steps: (i) determining the amino acid sequence of the polypeptide or part thereof; (ii) identifying one or more potential T-cell epitopes within the amino acid sequence of the protein by any method including determination of the binding of the peptides to MHC molecules using in vitro or in silico techniques or biological assays; (iii) designing new sequence variants with one or more amino acids within the identified potential T-cell epitopes modified in such a way to substantially reduce or eliminate the activity of the T-cell epitope as determined by the binding of the peptides to MHC molecules using in vitro or in silico techniques or biological assays; (iv) constructing such sequence variants by recombinant DNA techniques and testing said variants in order to identify one or more variants with desirable properties; and (v) optionally repeating steps (ii)-(iv);
-   an accordingly specified method, wherein step (iii) is carried out by substitution, addition or deletion of 1-9 amino acid residues in any of the originally present T-cell epitopes;
-   an accordingly specified method, wherein the alteration is made with reference to a homologous protein sequence and/or in silico modeling techniques;
-   an accordingly specified method, wherein step (ii) of above is carried out by the following steps: (a) selecting a region of the peptide having a known amino acid residue sequence; (b) sequentially sampling overlapping amino acid residue segments of predetermined uniform size and constituted by at least three amino acid residues from the selected region; (c) calculating MHC Class II molecule binding score for each said sampled segment by summing assigned values for each hydrophobic amino acid residue side chain present in said sampled amino acid residue segment; and (d) identifying at least one of said segments suitable for modification, based on the calculated MHC Class II molecule binding score for that segment, to change overall MHC Class II binding score for the peptide without substantially reducing therapeutic utility of the peptide; step (c) is preferably carried out by using a Böhm scoring function modified to include 12-6 van der Waals ligand-protein energy repulsive term and ligand conformational energy term by (1) providing a first data base of MHC Class II molecule models; (2) providing a second data base of allowed peptide backbones for said MHC Class II molecule models; (3) selecting a model from said first data base; (4) selecting an allowed peptide backbone from said second data base; (5) identifying amino acid residue side chains present in each sampled segment; (6) determining the binding affinity value for all side chains present in each sampled segment; and repeating steps (1) through (5) for each said model and each said backbone;
-   a 13 mer T-cell epitope peptide having a potential MHC class II binding activity and created from immunogenetically non-modified granulocyte colony stimulating factor (G-CSF), selected from the group as depicted in Table 1 and its use for the manufacture of G-CSF having substantially no or less immunogenicity than any non-modified molecule with the same biological activity when used in vivo;
-   a peptide sequence consisting of at least 9 consecutive amino acid residues of a 13 mer T-cell epitope peptide as specified above and its use for the manufacture of G-CSF having substantially no or less immunogenicity than any non-modified molecule with the same biological activity when used in vivo.
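Steps (a) to (c) of the segment sampling procedure listed above may be illustrated by the following minimal sketch. It is an illustration only and is not the modified Böhm scoring function: the per-residue values in HYDROPHOBIC_VALUES are hypothetical placeholders (one unit per hydrophobic side chain), and the function names are assumptions introduced for this example. The sequence is SEQ ID NO: 1 and the window size of 13 follows the 13 mer peptides of Table 1.

```python
# SEQ ID NO: 1 (174 amino acid form of human G-CSF), one-letter code.
G_CSF_174 = (
    "TPLGPASSLPQSFLLKCLEQVRKIQGDGAALQEKLCATYKLCHPEELVLLGHSLGIPWAP"
    "LSSCPSQALQLAGCLSQLHSGLFLYQGLLQALEGISPELGPTLDTLQLDVADFATTIWQQ"
    "MEELGMAPALQPTQGAMPAFASAFQRRAGGVLVASHLQSFLEVSYRVLRHLAQP"
)

# Hypothetical assigned values: one unit per hydrophobic side chain.
# Real values would come from a structure-based scoring function.
HYDROPHOBIC_VALUES = {aa: 1.0 for aa in "FILMVWY"}


def sample_segments(region, size=13):
    """Step (b): overlapping segments of uniform size, as (1-based start, segment)."""
    if size < 3:
        raise ValueError("segments must contain at least three residues")
    return [(i + 1, region[i:i + size]) for i in range(len(region) - size + 1)]


def toy_score(segment):
    """Step (c), simplified: sum the assigned values of hydrophobic residues."""
    return sum(HYDROPHOBIC_VALUES.get(aa, 0.0) for aa in segment)


def rank_segments(region, size=13):
    """Return (score, start, segment) tuples, highest toy score first."""
    scored = [(toy_score(seg), start, seg) for start, seg in sample_segments(region, size)]
    return sorted(scored, key=lambda item: (-item[0], item[1]))


if __name__ == "__main__":
    for score, start, segment in rank_segments(G_CSF_174)[:5]:
        print(start, segment, score)
```

Under these assumptions a 174 residue region yields 162 overlapping 13 mers, the first of which is the peptide listed as SEQ ID NO: 2. Step (d), the choice of which segment to modify, still depends on retained therapeutic activity and is not captured by such a score.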

The term “T-cell epitope” means according to the understanding of this invention an amino acid sequence which is able to bind MHC II, able to stimulate T-cells and/or also to bind (without necessarily measurably activating) T-cells in complex with MHC II. The term “peptide” as used herein and in the appended claims, is a compound that includes two or more amino acids. The amino acids are linked together by a peptide bond (defined herein below). There are 20 different naturally occurring amino acids involved in the biological production of peptides, and any number of them may be linked in any order to form a peptide chain or ring. The naturally occurring amino acids employed in the biological production of peptides all have the L-configuration. Synthetic peptides can be prepared employing conventional synthetic methods, utilizing L-amino acids, D-amino acids, or various combinations of amino acids of the two different configurations. Some peptides contain only a few amino acid units. Short peptides, e.g., having less than ten amino acid units, are sometimes referred to as “oligopeptides”. Other peptides contain a large number of amino acid residues, e.g. up to 100 or more, and are referred to as “polypeptides”. By convention, a “polypeptide” may be considered as any peptide chain containing three or more amino acids, whereas an “oligopeptide” is usually considered as a particular type of “short” polypeptide. Thus, as used herein, it is understood that any reference to a “polypeptide” also includes an oligopeptide. Further, any reference to a “peptide” includes polypeptides, oligopeptides, and proteins. Each different arrangement of amino acids forms different polypeptides or proteins. The number of polypeptides, and hence the number of different proteins, that can be formed is practically unlimited. “Alpha carbon (Cα)” is the carbon atom of the carbon-hydrogen (CH) component that is in the peptide chain. A “side chain” is a pendant group to Cα that can comprise a simple or complex group or moiety, having physical dimensions that can vary significantly compared to the dimensions of the peptide.

The invention may be applied to any G-CSF species of molecule with substantially the same primary amino acid sequences as those disclosed herein and would therefore include G-CSF molecules derived by genetic engineering means or other processes, which may not contain either 177 or 174 amino acid residues. Granulocyte colony stimulating factor (G-CSF) proteins such as identified from other mammalian sources have in common many of the peptide sequences of the present disclosure and have in common many peptide sequences with substantially the same sequence as those of the disclosed listing. Such protein sequences equally therefore fall under the scope of the present invention.

The invention is conceived to overcome the practical reality that soluble proteins introduced into autologous organisms can trigger an immune response resulting in development of host antibodies that bind to the soluble protein. One example, amongst others, is interferon alpha 2, to which a proportion of human patients make antibodies despite the fact that this protein is produced endogenously [Russo, D. et al (1996) ibid; Stein, R. et al (1988) ibid]. It is likely that the same situation pertains to the therapeutic use of granulocyte colony stimulating factor (G-CSF) and the present invention seeks to address this by providing granulocyte colony stimulating factor (G-CSF) proteins with altered propensity to elicit an immune response on administration to the human host.

The general method of the present invention leading to the modified granulocyte colony stimulating factor (G-CSF) comprises the following steps:

-   (a) determining the amino acid sequence of the polypeptide or part thereof;
-   (b) identifying one or more potential T-cell epitopes within the amino acid sequence of the protein by any method including determination of the binding of the peptides to MHC molecules using in vitro or in silico techniques or biological assays;
-   (c) designing new sequence variants with one or more amino acids within the identified potential T-cell epitopes modified in such a way to substantially reduce or eliminate the activity of the T-cell epitope as determined by the binding of the peptides to MHC molecules using in vitro or in silico techniques or biological assays. Such sequence variants are created in such a way to avoid creation of new potential T-cell epitopes by the sequence variations unless such new potential T-cell epitopes are, in turn, modified in such a way to substantially reduce or eliminate the activity of the T-cell epitope; and
-   (d) constructing such sequence variants by recombinant DNA techniques and testing said variants in order to identify one or more variants with desirable properties according to well known recombinant techniques.
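Step (c) requires that a sequence variation does not itself create a new potential T-cell epitope, which in practice means re-examining every overlapping segment that contains the altered residue. The following minimal sketch lists those segments. The function name and the default window size of 13 (taken from the 13 mer peptides of this disclosure) are assumptions for illustration; any re-scoring of the listed windows is left to the chosen prediction method.

```python
def windows_covering(position, seq_len, size=13):
    """Return the 1-based start positions of all windows of `size` residues
    that contain the residue at 1-based `position`."""
    if not 1 <= position <= seq_len:
        raise ValueError("position lies outside the sequence")
    first = max(1, position - size + 1)      # earliest window still reaching position
    last = min(position, seq_len - size + 1)  # latest window that fits in the sequence
    return list(range(first, last + 1))


if __name__ == "__main__":
    # A substitution at residue 50 of a 174 residue protein touches 13 windows.
    print(windows_covering(50, 174))
```

A substitution near either terminus affects fewer windows than one in the interior, so interior substitutions call for more re-checking.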

The identification of potential T-cell epitopes according to step (b) can be carried out according to methods described previously in the prior art. Suitable methods are disclosed in WO 98/59244; WO 98/52976; WO 00/34317 and may preferably be used to identify binding propensity of granulocyte colony stimulating factor (G-CSF)-derived peptides to an MHC class II molecule.

Another very efficacious method for identifying T-cell epitopes by calculation is described in the EXAMPLE which is a preferred embodiment according to this invention.

In practice a number of variant granulocyte colony stimulating factor (G-CSF) proteins will be produced and tested for the desired immune and functional characteristic. The variant proteins will most preferably be produced by recombinant DNA techniques although other procedures including chemical synthesis of granulocyte colony stimulating factor (G-CSF) fragments may be contemplated.

The results of an analysis according to step (b) of the above scheme and pertaining to the whole human G-CSF protein sequences of both the 174 and 177 forms are presented in Table 1.

TABLE 1
Peptide sequences in human granulocyte colony stimulating factor (G-CSF) with potential human MHC class II binding activity.

TPLGPASSLPQSF (SEQ ID NO: 2), SSLPQSFLLKCLE (SEQ ID NO: 3),
QSFLLKCLEQVRK (SEQ ID NO: 4), SFLLKCLEQVRKI (SEQ ID NO: 5),
FLLKCLEQVRKIQ (SEQ ID NO: 6), KCLEQVRKIQGDG (SEQ ID NO: 7),
EQVRKIQGDGAAL (SEQ ID NO: 8), RKIQGDGAALQEK (SEQ ID NO: 9),
AALQEKLVSECAT (SEQ ID NO: 10), EKLVSECATYKLC (SEQ ID NO: 11),
KLVSECATYKLCH (SEQ ID NO: 12), AALQEKLCATYKL (SEQ ID NO: 13),
EKLCATYKLCHPE (SEQ ID NO: 14), ATYKLCHPEELVL (SEQ ID NO: 15),
YKLCHPEELVLLG (SEQ ID NO: 16), EELVLLGHSLGIP (SEQ ID NO: 17),
ELVLLGHSLGIPW (SEQ ID NO: 18), HSLGIPWAPLSSC (SEQ ID NO: 19),
IPWAPLSSCPSQA (SEQ ID NO: 20), APLSSCPSQALQL (SEQ ID NO: 21),
QALQLAGCLSQLH (SEQ ID NO: 22), GCLSQLHSGLFLY (SEQ ID NO: 23),
SQLHSGLFLYQGL (SEQ ID NO: 24), SGLFLYQGLLQAL (SEQ ID NO: 25),
GLFLYQGLLQALE (SEQ ID NO: 26), LFLYQGLLQALEG (SEQ ID NO: 27),
FLYQGLLQALEGI (SEQ ID NO: 28), QGLLQALEGISPE (SEQ ID NO: 29),
GLLQALEGISPEL (SEQ ID NO: 30), QALEGISPELGPT (SEQ ID NO: 31),
EGISPELGPTLDT (SEQ ID NO: 32), PTLDTLQLDVADF (SEQ ID NO: 33),
DTLQLDVADFATT (SEQ ID NO: 34), LQLDVADFATTIW (SEQ ID NO: 35),
LDVADFATTIWQQ (SEQ ID NO: 36), TTIWQQMEELGMA (SEQ ID NO: 37),
TIWQQMEELGMAP (SEQ ID NO: 38), QQMEELGMAPALQ (SEQ ID NO: 39),
EELGMAPALQPTQ (SEQ ID NO: 40), LGMAPALQPTQGA (SEQ ID NO: 41),
PALQPTQGAMPAF (SEQ ID NO: 42), GAMPAFASAFQRR (SEQ ID NO: 43),
PAFASAFQRRAGG (SEQ ID NO: 44), SAFQRRAGGVLVA (SEQ ID NO: 45),
GGVLVASHLQSFL (SEQ ID NO: 46), GVLVASHLQSFLE (SEQ ID NO: 47),
VLVASHLQSFLEV (SEQ ID NO: 48), SHLQSFLEVSYRV (SEQ ID NO: 49),
QSFLEVSYRVLRH (SEQ ID NO: 50), SFLEVSYRVLRHL (SEQ ID NO: 51),
LEVSYRVLRHLAQ (SEQ ID NO: 52).

Peptide sequences include peptides identified in both the 177 and 174 amino acid forms of G-CSF. Peptides are 13-mers; amino acids are identified using the single-letter code.

The results of the design and constructs according to steps (c) and (d) of the above scheme, pertaining to the modified molecule of this invention, are presented in Tables 2 and 3.

TABLE 2
Substitutions leading to the elimination of potential T-cell epitopes of human granulocyte colony stimulating factor (G-CSF) (WT = wild type).

Residue #   WT Residue   Substitution
3           L            A C D E G H K N P Q R S T
9           L            A C D E G H K N P Q R S T
14          L            A C D E G H K N P Q R S T
15          L            A C D E G H K N P Q R S T
18          L            A C D E G H K N P Q R S T
21          V            A C D E G H K N P Q R S T
24          I            A C D E G H K N P Q R S T
31          L            A C D E G H K N P Q R S T
35          L            A C D E G H K N P Q R S T
39          Y            A C D E G H K N P Q R S T
41          L            A C D E G H K N P Q R S T
47          L            A C D E G H K N P Q R S T
48          V            A C D E G H K N P Q R S T
49          L            A C D E G H K N P Q R S T
50          L            A C D E G H K N P Q R S T
54          L            A C D E G H K N P Q R S T
56          I            A C D E G H K N P Q R S T
58          W            A C D E G H K N P Q R S T
61          L            A C D E G H K N P Q R S T
69          L            A C D E G H K N P Q R S T
71          L            A C D E G H K N P Q R S T
75          L            A C D E G H K N P Q R S T
78          L            A C D E G H K N P Q R S T
82          L            A C D E G H K N P Q R S T
83          F            A C D E G H K N P Q R S T
84          L            A C D E G H K N P Q R S T
85          Y            A C D E G H K N P Q R S T
88          L            A C D E G H K N P Q R S T
89          L            A C D E G H K N P Q R S T
92          L            A C D E G H K N P Q R S T
95          I            A C D E G H K N P Q R S T
99          L            A C D E G H K N P Q R S T
103         L            A C D E G H K N P Q R S T
106         L            A C D E G H K N P Q R S T
108         L            A C D E G H K N P Q R S T
110         V            A C D E G H K N P Q R S T
113         F            A C D E G H K N P Q R S T
117         I            A C D E G H K N P Q R S T
118         W            A C D E G H K N P Q R S T
121         M            A C D E G H K N P Q R S T
124         L            A C D E G H K N P Q R S T
130         L            A C D E G H K N P Q R S T
137         M            A C D E G H K N P Q R S T
140         F            A C D E G H K N P Q R S T
144         F            A C D E G H K N P Q R S T
151         V            A C D E G H K N P Q R S T
152         L            A C D E G H K N P Q R S T
153         V            A C D E G H K N P Q R S T
157         L            A C D E G H K N P Q R S T
160         F            A C D E G H K N P Q R S T
161         L            A C D E G H K N P Q R S T
163         V            A C D E G H K N P Q R S T

TABLE 3
Additional substitutions leading to the removal of a potential T-cell epitope for 1 or more MHC allotypes.

Residue #   WT Residue   Substitution
6           A            H P Q R S T
8           L            H L
14          L            F I M V W Y
16          K            A C F G I L M P V W Y
17          C            D E H K N P Q R S T
18          L            F I M V W Y
19          E            A C G L M V W Y
21          V            M W Y
22          R            H P Q S T
24          I            W Y
29          A            D E G H K N P Q R T
30          A            H K P Q R S T
31          L            F I M V W Y
32          Q            A C G I L M P V W Y
33          E            A C G H P
34          K            H P T W
35          L            I M W Y
39          Y            M W
40          K            A C F G I L M P V W Y
41          L            M W Y
42          C            H P T
43          H            A G P
45          E            P T
47          L            F I M V W Y
48          V            M W Y
49          L            F I W
50          L            F M
51          G            P T W
53          S            P Y
54          L            F I M V W Y
61          L            F I M W Y
62          S            A C F G I L M P V W Y
63          S            A C G P
64          C            D E H I K N P Q R S T W Y
66          S            D H P
67          Q            P T
69          L            W
71          L            F I M V W Y
74          L            H P
75          L            F I M V W Y
76          S            A C G P
77          Q            A C F G L P M P V W Y
79          H            A C G P W Y
80          S            F H P W Y
81          G            E H P Q R S T
82          L            F I M V W Y
86          Q            A C G P
87          G            F I L P W
90          Q            F I P T W Y
92          L            W Y
98          E            P T
99          L            I M V W Y
100         G            H P T
103         L            M W Y
106         L            F I
108         L            F I M V W
110         V            M W
112         D            A C G P
113         F            M W
117         I            M W
119         Q            A C F G I L M P V W Y
120         Q            I P T
122         E            P
123         E            H P
124         L            F I M W Y
127         A            P
129         A            D E H N P Q R S T W
130         L            F I M V W Y
131         Q            A C G P
134         Q            A C G P
135         G            D H N P Q R S T
136         A            P
142         S            A C P
149         G            D H P
150         G            P
151         V            I M W Y
152         L            I M V W Y
153         V            F M W Y
154         A            P
155         S            P
156         H            P
159         S            F H I P T V W Y
161         L            F I M V W Y
162         E            A C G P
163         V            F I M W Y
164         S            A C G M P T
166         R            F P T Y
167         V            P
168         L            F H I K M N P Q R S T V W Y
169         R            P T
170         H            A C F G I L M P V W Y
171         I            A C D E F G H I K L M N P Q R S T V W Y

The invention relates to granulocyte colony stimulating factor (G-CSF) analogues in which substitutions of at least one amino acid residue have been made at positions resulting in a substantial reduction in activity of, or elimination of, one or more potential T-cell epitopes from the protein. One or more amino acid substitutions at particular points within any of the potential MHC class II ligands identified in Table 1 may result in a granulocyte colony stimulating factor (G-CSF) molecule with a reduced immunogenic potential when administered as a therapeutic to the human host. Preferably, amino acid substitutions are made at appropriate points within the peptide sequence predicted to achieve substantial reduction or elimination of the activity of the T-cell epitope. In practice an appropriate point will preferably equate to an amino acid residue binding within one of the hydrophobic pockets provided within the MHC class II binding groove.

It is most preferred to alter binding within the first pocket of the cleft at the so-called P1 or P1 anchor position of the peptide. The quality of binding interaction between the P1 anchor residue of the peptide and the first pocket of the MHC class II binding groove is recognized as being a major determinant of overall binding affinity for the whole peptide. An appropriate substitution at this position of the peptide will be for a residue less readily accommodated within the pocket, for example, substitution to a more hydrophilic residue. Amino acid residues in the peptide at positions equating to binding within other pocket regions within the MHC binding cleft are also considered and fall under the scope of the present invention.

It is understood that single amino acid substitutions within a given potential T-cell epitope are the most preferred route by which the epitope may be eliminated. Combinations of substitutions within a single epitope may be contemplated and, for example, can be particularly appropriate where individually defined epitopes overlap with each other. Moreover, amino acid substitutions, either singly within a given epitope or in combination within a single epitope, may be made at positions not equating to the “pocket residues” with respect to the MHC class II binding groove, but at any point within the peptide sequence. Substitutions may be made with reference to a homologous structure or structural method produced using in silico techniques known in the art and may be based on known structural features of the molecule according to this invention. All such substitutions fall within the scope of the present invention.

Amino acid substitutions other than within the peptides identified above may be contemplated, particularly when made in combination with substitution(s) made within a listed peptide. For example, a change may be contemplated to restore structure or biological activity of the variant molecule. Such compensatory changes, and changes to include deletion or addition of particular amino acid residues from the granulocyte colony stimulating factor (G-CSF) polypeptide resulting in a variant with desired activity and in combination with changes in any of the disclosed peptides, fall under the scope of the present invention.

In as far as this invention relates to modified granulocyte colony stimulating factor (G-CSF), compositions containing such modified G-CSF proteins or fragments of modified G-CSF proteins and related compositions should be considered within the scope of the invention. In another aspect, the present invention relates to nucleic acids encoding modified granulocyte colony stimulating factor (G-CSF) entities. In a further aspect the present invention relates to methods for therapeutic treatment of humans using the modified G-CSF proteins.

EXAMPLE

There are a number of factors that play important roles in determining the total structure of a protein or polypeptide. First, the peptide bond, i.e., that bond which joins the amino acids in the chain together, is a covalent bond. This bond is planar in structure, essentially a substituted amide. An “amide” is any of a group of organic compounds containing the grouping -CONH-.

The planar peptide bond linking Cα of adjacent amino acids may be represented as depicted below:

Because the O=C and the C-N atoms lie in a relatively rigid plane, free rotation does not occur about these axes. Hence, a plane schematically depicted by the interrupted line is sometimes referred to as an “amide” or “peptide” plane, wherein lie the oxygen (O), carbon (C), nitrogen (N), and hydrogen (H) atoms of the peptide backbone. At opposite corners of this amide plane are located the Cα atoms. Since there is substantially no rotation about the O=C and C-N atoms in the peptide or amide plane, a polypeptide chain thus comprises a series of planar peptide linkages joining the Cα atoms.

A second factor that plays an important role in defining the total structure or conformation of a polypeptide or protein is the angle of rotation of each amide plane about the common Cα linkage. The terms “angle of rotation” and “torsion angle” are hereinafter regarded as equivalent terms. Assuming that the O, C, N, and H atoms remain in the amide plane (which is usually a valid assumption, although there may be some slight deviations from planarity of these atoms for some conformations), these angles of rotation define the polypeptide's backbone conformation, i.e., the structure as it exists between adjacent residues. These two angles are known as φ and ψ. A set of the angles φᵢ, ψᵢ, where the subscript i represents a particular residue of a polypeptide chain, thus effectively defines the polypeptide secondary structure. The conventions used in defining the φ, ψ angles, i.e., the reference points at which the amide planes form a zero degree angle, and the definition of which angle is φ, and which angle is ψ, for a given polypeptide, are defined in the literature. See, e.g., Ramachandran et al. Adv. Prot. Chem. 23:283-437 (1968), at pages 285-94, which pages are incorporated herein by reference.

The present method can be applied to any protein, and is based in part upon the discovery that in humans the primary Pocket 1 anchor position of MHC Class II molecule binding grooves has a well designed specificity for particular amino acid side chains. The specificity of this pocket is determined by the identity of the amino acid at position 86 of the beta chain of the MHC Class II molecule. This site is located at the bottom of Pocket 1 and determines the size of the side chain that can be accommodated by this pocket (Marshall, K. W., J. Immunol., 152:4946-4956 (1994)). If this residue is a glycine, then all hydrophobic aliphatic and aromatic amino acids (hydrophobic aliphatics being: valine, leucine, isoleucine, methionine and aromatics being: phenylalanine, tyrosine and tryptophan) can be accommodated in the pocket, a preference being for the aromatic side chains. If this pocket residue is a valine, then the side chain of this amino acid protrudes into the pocket and restricts the size of peptide side chains that can be accommodated such that only hydrophobic aliphatic side chains can be accommodated. Therefore, in an amino acid residue sequence, wherever an amino acid with a hydrophobic aliphatic or aromatic side chain is found, there is the potential for an MHC Class II restricted T-cell epitope to be present. If the side chain is hydrophobic aliphatic, however, it is approximately twice as likely to be associated with a T-cell epitope as an aromatic side chain (assuming an approximately even distribution of Pocket 1 types throughout the global population).
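The Pocket 1 rule described above can be sketched as a small lookup. This is only an illustration under stated assumptions: the function name `pocket1_accepts` and the single-letter encoding of the beta-chain residue 86 are hypothetical, and only the glycine and valine cases named in the text are handled.

```python
# Residue classes as listed in the text (single-letter code).
ALIPHATIC = set("VLIM")   # valine, leucine, isoleucine, methionine
AROMATIC = set("FYW")     # phenylalanine, tyrosine, tryptophan


def pocket1_accepts(beta86: str, residue: str) -> bool:
    """Return True if the peptide residue can sit in Pocket 1.

    beta86 is the amino acid at position 86 of the MHC Class II beta chain
    (hypothetical parameter name); only 'G' and 'V' are covered by the text.
    """
    if beta86 == "G":
        # Glycine leaves room for both aliphatic and aromatic side chains.
        return residue in ALIPHATIC or residue in AROMATIC
    if beta86 == "V":
        # Valine protrudes into the pocket: only aliphatic side chains fit.
        return residue in ALIPHATIC
    raise ValueError("only glycine (G) and valine (V) are described in the text")


# Aliphatic residues fit both pocket types, aromatic residues only one,
# which is the origin of the roughly 2:1 weighting mentioned above.
pocket_types_per_residue = {
    aa: sum(pocket1_accepts(p, aa) for p in "GV") for aa in "LF"
}
```

Under an even split of the two pocket types, `pocket_types_per_residue` gives 2 for leucine and 1 for phenylalanine.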

A computational method embodying the present invention profiles the likelihood of peptide regions to contain T-cell epitopes as follows:

(1) The primary sequence of a peptide segment of predetermined length is scanned, and all hydrophobic aliphatic and aromatic side chains present are identified.

(2) The hydrophobic aliphatic side chains are assigned a value greater than that for the aromatic side chains; preferably about twice the value assigned to the aromatic side chains, e.g., a value of 2 for a hydrophobic aliphatic side chain and a value of 1 for an aromatic side chain.

(3) The values determined to be present are summed for each overlapping amino acid residue segment (window) of predetermined uniform length within the peptide, and the total value for a particular segment (window) is assigned to a single amino acid residue at an intermediate position of the segment (window), preferably to a residue at about the midpoint of the sampled segment (window). This procedure is repeated for each sampled overlapping amino acid residue segment (window). Thus, each amino acid residue of the peptide is assigned a value that relates to the likelihood of a T-cell epitope being present in that particular segment (window).

(4) The values calculated and assigned as described in Step 3, above, can be plotted against the amino acid coordinates of the entire amino acid residue sequence being assessed.

(5) All portions of the sequence which have a score of a predetermined value, e.g., a value of 1, are deemed likely to contain a T-cell epitope and can be modified, if desired.

This particular aspect of the present invention provides a general method by which the regions of peptides likely to contain T-cell epitopes can be described. Modifications to the peptide in these regions have the potential to modify the MHC Class II binding characteristics.
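Steps (1) to (5) above can be sketched as a sliding-window sum. This is a minimal sketch, not the implementation used by the inventors: the window length of 9, the function names, and the reading of step (5) as "score at or above a threshold" are assumptions made for illustration. The values 2 and 1 are the example values given in step (2).

```python
ALIPHATIC = set("VLIM")   # hydrophobic aliphatic side chains
AROMATIC = set("FYW")     # aromatic side chains


def residue_value(aa: str) -> int:
    """Step (2): 2 for aliphatic, 1 for aromatic, 0 otherwise."""
    if aa in ALIPHATIC:
        return 2
    if aa in AROMATIC:
        return 1
    return 0


def epitope_profile(sequence: str, window: int = 9):
    """Step (3): sum values over each overlapping window and assign the
    total to the residue at the window midpoint (0-based index).

    The window length of 9 is an assumed example value.
    """
    values = [residue_value(aa) for aa in sequence]
    half = window // 2
    return [
        (start + half, sum(values[start:start + window]))
        for start in range(len(sequence) - window + 1)
    ]


def flagged_positions(profile, threshold):
    """Step (5), read here as 'score at or above a predetermined value'."""
    return [index for index, score in profile if score >= threshold]


# SEQ ID NO: 5 from Table 1, used purely as sample input.
profile = epitope_profile("SFLLKCLEQVRKI")
# profile == [(4, 7), (5, 9), (6, 8), (7, 6), (8, 6)]
```

The list of `(index, score)` pairs is the data that step (4) would plot against the sequence coordinates.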

According to another aspect of the present invention, T-cell epitopes can be predicted with greater accuracy by the use of a more sophisticated computational method which takes into account the interactions of peptides with models of MHC Class II alleles. The computational prediction of T-cell epitopes present within a peptide according to this particular aspect contemplates: the construction of models of at least 42 MHC Class II alleles based upon the structures of all known MHC Class II molecules, and a method for the use of these models in the computational identification of T-cell epitopes; the construction of libraries of peptide backbones for each model in order to allow for the known variability in relative peptide backbone alpha carbon (Cα) positions; the construction of libraries of amino-acid side chain conformations for each backbone docked with each model for each of the 20 amino-acid alternatives at positions critical for the interaction between peptide and MHC Class II molecule; and the use of these libraries of backbones and side-chain conformations in conjunction with a scoring function to select the optimum backbone and side-chain conformation for a particular peptide docked with a particular MHC Class II molecule, and the derivation of a binding score from this interaction.

Models of MHC Class II molecules can be derived via homology modeling from a number of similar structures found in the Brookhaven Protein Data Bank (“PDB”). These may be made by the use of semi-automatic homology modeling software (Modeller, Sali A. & Blundell T L., 1993. J. Mol Biol 234:779-815) which incorporates a simulated annealing function, in conjunction with the CHARMm force-field for energy minimisation (available from Molecular Simulations Inc., San Diego, Calif.). Alternative modeling methods can be utilized as well.

The present method differs significantly from other computational methods which use libraries of experimentally derived binding data of each amino-acid alternative at each position in the binding groove for a small set of MHC Class II molecules (Marshall, K. W., et al., Biomed. Pept. Proteins Nucleic Acids, 1(3):157-162 (1995)) or yet other computational methods which use similar experimental binding data in order to define the binding characteristics of particular types of binding pockets within the groove, again using a relatively small subset of MHC Class II molecules, and then ‘mixing and matching’ pocket types from this pocket library to artificially create further ‘virtual’ MHC Class II molecules (Sturniolo T., et al., Nat. Biotech, 17(6):555-561 (1999)). Both prior methods suffer the major disadvantage that, due to the complexity of the assays and the need to synthesize large numbers of peptide variants, only a small number of MHC Class II molecules can be experimentally scanned. Therefore the first prior method can only make predictions for a small number of MHC Class II molecules. The second prior method also makes the assumption that a pocket lined with similar amino-acids in one molecule will have the same binding characteristics when in the context of a different Class II allele, and suffers the further disadvantage that only those MHC Class II molecules can be ‘virtually’ created which contain pockets contained within the pocket library. Using the modeling approach described herein, the structure of any number and type of MHC Class II molecules can be deduced; therefore alleles can be specifically selected to be representative of the global population. In addition, the number of MHC Class II molecules scanned can be increased by making further models rather than having to generate additional data via complex experimentation.

The use of a backbone library allows for variation in the positions of the Cα atoms of the various peptides being scanned when docked with particular MHC Class II molecules. This is again in contrast to the alternative prior computational methods described above, which rely on the use of simplified peptide backbones for scanning amino-acid binding in particular pockets. These simplified backbones are not likely to be representative of backbone conformations found in ‘real’ peptides, leading to inaccuracies in prediction of peptide binding. The present backbone library is created by superposing the backbones of all peptides bound to MHC Class II molecules found within the Protein Data Bank and noting the root mean square (RMS) deviation between the Cα atoms of each of the eleven amino-acids located within the binding groove. While this library can be derived from a small number of suitable available mouse and human structures (currently 13), in order to allow for the possibility of even greater variability, the RMS figure for each Cα position is increased by 50%. The average Cα position of each amino-acid is then determined and a sphere drawn around this point whose radius equals the RMS deviation at that position plus 50%. This sphere represents all allowed Cα positions.

Working from the Cα with the least RMS deviation (that of the amino-acid in Pocket 1 as mentioned above, equivalent to Position 2 of the 11 residues in the binding groove), the sphere is three-dimensionally gridded, and each vertex within the grid is then used as a possible location for a Cα of that amino-acid. The subsequent amide plane, corresponding to the peptide bond to the subsequent amino-acid, is grafted onto each of these Cαs and the φ and ψ angles are rotated step-wise at set intervals in order to position the subsequent Cα. If the subsequent Cα falls within the ‘sphere of allowed positions’ for this Cα then the orientation of the dipeptide is accepted, whereas if it falls outside the sphere then the dipeptide is rejected. This process is then repeated for each of the subsequent Cα positions, such that the peptide grows from the Pocket 1 Cα ‘seed’, until all nine subsequent Cαs have been positioned from all possible permutations of the preceding Cαs. The process is then repeated once more for the single Cα preceding Pocket 1 to create a library of backbone Cα positions located within the binding groove.

The number of backbones generated is dependent upon several factors: the size of the ‘spheres of allowed positions’; the fineness of the gridding of the ‘primary sphere’ at the Pocket 1 position; and the fineness of the step-wise rotation of the φ and ψ angles used to position subsequent Cαs. Using this process, a large library of backbones can be created. The larger the backbone library, the more likely it will be that the optimum fit will be found for a particular peptide within the binding groove of an MHC Class II molecule. Inasmuch as not all backbones will be suitable for docking with all the models of MHC Class II molecules due to clashes with amino-acids of the binding domains, for each allele a subset of the library is created comprising backbones which can be accommodated by that allele.

The use of the backbone library, in conjunction with the models of MHC Class II molecules, creates an exhaustive database consisting of allowed side chain conformations for each amino-acid in each position of the binding groove for each MHC Class II molecule docked with each allowed backbone. This data set is generated using a simple steric overlap function where an MHC Class II molecule is docked with a backbone and an amino-acid side chain is grafted onto the backbone at the desired position. Each of the rotatable bonds of the side chain is rotated step-wise at set intervals and the resultant positions of the atoms dependent upon that bond noted. The interaction of the atom with atoms of side-chains of the binding groove is noted and positions are either accepted or rejected according to the following criterion: the sum total of the overlap of all atoms so far positioned must not exceed a pre-determined value. Thus the stringency of the conformational search is a function of the interval used in the step-wise rotation of the bond and the pre-determined limit for the total overlap. This latter value can be small if it is known that a particular pocket is rigid; however, the stringency can be relaxed if the positions of pocket side-chains are known to be relatively flexible. Thus allowances can be made to imitate variations in flexibility within pockets of the binding groove. This conformational search is then repeated for every amino-acid at every position of each backbone when docked with each of the MHC Class II molecules to create the exhaustive database of side-chain conformations.
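The ‘sphere of allowed positions’ test and the gridding of the primary sphere described above can be sketched as follows. This is an illustrative sketch only: the function names, the cubic grid, and the tuple representation of coordinates are assumptions, and the φ/ψ rotation that generates each candidate Cα is not modelled here.

```python
import math


def allowed_radius(rms_deviation: float, margin: float = 0.5) -> float:
    """Sphere radius: the RMS deviation at that position plus 50%."""
    return rms_deviation * (1.0 + margin)


def within_allowed_sphere(candidate, mean_position, rms_deviation) -> bool:
    """Accept a candidate C-alpha if it lies inside the allowed sphere
    centred on the average C-alpha position for that residue."""
    return math.dist(candidate, mean_position) <= allowed_radius(rms_deviation)


def grid_vertices(center, radius: float, step: float):
    """Vertices of a cubic grid (assumed grid type) that fall inside the
    sphere; each vertex is a possible location for the seed C-alpha."""
    n = math.floor(radius / step)
    offsets = range(-n, n + 1)
    vertices = []
    for i in offsets:
        for j in offsets:
            for k in offsets:
                point = (center[0] + i * step,
                         center[1] + j * step,
                         center[2] + k * step)
                if math.dist(point, center) <= radius:
                    vertices.append(point)
    return vertices


# Example: an RMS deviation of 1.0 (arbitrary units) gives a radius of 1.5.
seeds = grid_vertices((0.0, 0.0, 0.0), allowed_radius(1.0), step=1.0)
```

A finer `step` yields more seed positions and therefore a larger backbone library, which matches the dependence on gridding fineness noted in the text.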

A suitable mathematical expression is used to estimate the energy of binding between models of MHC Class II molecules in conjunction with peptide ligand conformations which have to be empirically derived by scanning the large database of backbone/side-chain conformations described above. Thus a protein is scanned for potential T-cell epitopes by subjecting each possible peptide of length varying between 9 and 20 amino-acids (although the length is kept constant for each scan) to the following computations: An MHC Class II molecule is selected together with a peptide backbone allowed for that molecule, and the side-chains corresponding to the desired peptide sequence are grafted on. Atom identity and interatomic distance data relating to a particular side-chain at a particular position on the backbone are collected for each allowed conformation of that amino-acid (obtained from the database described above). This is repeated for each side-chain along the backbone and peptide scores derived using a scoring function. The best score for that backbone is retained and the process repeated for each allowed backbone for the selected model. The scores from all allowed backbones are compared and the highest score is deemed to be the peptide score for the desired peptide in that MHC Class II model. This process is then repeated for each model with every possible peptide derived from the protein being scanned, and the scores for peptides versus models are displayed.
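The outer loop of this scan can be sketched as below. The sketch assumes a caller-supplied `score_fn(peptide, model, backbone)` that already returns the best score for one backbone; that callable, the dictionary of allowed backbones per model, and all function names are hypothetical placeholders, since the conformation database itself is not reproduced here.

```python
def candidate_peptides(sequence: str, length: int):
    """All overlapping peptides of one fixed length (9 to 20 residues)."""
    if not 9 <= length <= 20:
        raise ValueError("peptide length is kept between 9 and 20")
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


def peptide_score(peptide, model, backbones, score_fn):
    """Highest score over all backbones allowed for this model."""
    return max(score_fn(peptide, model, backbone) for backbone in backbones)


def scan_protein(sequence, length, allowed_backbones, score_fn):
    """Score every peptide against every model.

    allowed_backbones maps a model name to the backbones that model can
    accommodate (a hypothetical data layout).
    """
    results = {}
    for model, backbones in allowed_backbones.items():
        for peptide in candidate_peptides(sequence, length):
            results[(peptide, model)] = peptide_score(
                peptide, model, backbones, score_fn
            )
    return results


# Toy stand-ins: backbones are plain numbers and the score is arbitrary.
def toy_score(peptide, model, backbone):
    return peptide.count("L") + backbone


scores = scan_protein(
    "TPLGPASSLPQSF",                      # SEQ ID NO: 2, sample input only
    9,
    {"model_a": [0.0, 1.0], "model_b": [0.5]},
    toy_score,
)
```

The resulting mapping of `(peptide, model)` to score corresponds to the "peptides versus models" display mentioned in the text.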

In the context of the present invention, each ligand presented for the binding affinity calculation is an amino-acid segment selected from a peptide or protein as discussed above. Thus, the ligand is a selected stretch of amino acids about 9 to 20 amino acids in length derived from a peptide, polypeptide or protein of known sequence. The terms “amino acids” and “residues” are hereinafter regarded as equivalent terms. The ligand, in the form of the consecutive amino acids of the peptide to be examined grafted onto a backbone from the backbone library, is positioned in the binding cleft of an MHC Class II molecule from the MHC Class II molecule model library via the coordinates of the Cα atoms of the peptide backbone, and an allowed conformation for each side-chain is selected from the database of allowed conformations. The relevant atom identities and interatomic distances are also retrieved from this database and used to calculate the peptide binding score. Ligands with a high binding affinity for the MHC Class II binding pocket are flagged as candidates for site-directed mutagenesis. Amino-acid substitutions are made in the flagged ligand (and hence in the protein of interest), which is then retested using the scoring function in order to determine changes which reduce the binding affinity below a predetermined threshold value. These changes can then be incorporated into the protein of interest to remove T-cell epitopes.

Binding between the peptide ligand and the binding groove of MHC Class II molecules involves non-covalent interactions including, but not limited to: hydrogen bonds, electrostatic interactions, hydrophobic (lipophilic) interactions and van der Waals interactions. These are included in the peptide scoring function as described in detail below.

It should be understood that a hydrogen bond is a non-covalent bond which can be formed between polar or charged groups and consists of a hydrogen atom shared by two other atoms. The hydrogen of the hydrogen donor has a positive charge whereas the hydrogen acceptor has a partial negative charge. For the purposes of peptide/protein interactions, hydrogen bond donors may be either nitrogens with hydrogen attached or hydrogens attached to oxygen or nitrogen. Hydrogen bond acceptor atoms may be oxygens not attached to hydrogen, nitrogens with no hydrogens attached and one or two connections, or sulphurs with only one connection. Certain atoms, such as oxygens attached to hydrogens or imine nitrogens (e.g. C=NH), may be either hydrogen acceptors or donors. Hydrogen bond energies range from 3 to 7 kcal/mol and are much stronger than van der Waals bonds, but weaker than covalent bonds. Hydrogen bonds are also highly directional and are at their strongest when the donor atom, hydrogen atom and acceptor atom are co-linear.

Electrostatic bonds are formed between oppositely charged ion pairs and the strength of the interaction is inversely proportional to the square of the distance between the atoms according to Coulomb's law. The optimal distance between ion pairs is about 2.8 Å. In protein/peptide interactions, electrostatic bonds may be formed between arginine, histidine or lysine and aspartate or glutamate. The strength of the bond will depend upon the pKa of the ionizing group and the dielectric constant of the medium, although they are approximately similar in strength to hydrogen bonds.

Lipophilic interactions are favorable hydrophobic-hydrophobic contacts that occur between the protein and peptide ligand. Usually, these will occur between hydrophobic amino acid side chains of the peptide buried within the pockets of the binding groove such that they are not exposed to solvent. Exposure of the hydrophobic residues to solvent is highly unfavorable since the surrounding solvent molecules are forced to hydrogen bond with each other, forming cage-like clathrate structures. The resultant decrease in entropy is highly unfavorable. Lipophilic atoms may be sulphurs which are neither polar nor hydrogen acceptors and carbon atoms which are not polar.

Van der Waals bonds are non-specific forces found between atoms which are 3-4 Å apart. They are weaker and less specific than hydrogen and electrostatic bonds. The distribution of electronic charge around an atom changes with time and, at any instant, the charge distribution is not symmetric. This transient asymmetry in electronic charge induces a similar asymmetry in neighboring atoms. The resultant attractive forces between atoms reach a maximum at the van der Waals contact distance but diminish very rapidly at about 1 Å to about 2 Å. Conversely, as atoms become separated by less than the contact distance, increasingly strong repulsive forces become dominant as the outer electron clouds of the atoms overlap. Although the attractive forces are relatively weak compared to electrostatic and hydrogen bonds (about 0.6 kcal/mol), the repulsive forces in particular may be very important in determining whether a peptide ligand may bind successfully to a protein.

In one embodiment, the Böhm scoring function (SCORE1 approach) is used to estimate the binding constant (Böhm, H. J., J. Comput Aided Mol. Des., 8(3):243-256 (1994), which is hereby incorporated in its entirety). In another embodiment, the scoring function (SCORE2 approach) is used to estimate the binding affinities as an indicator of a ligand containing a T-cell epitope (Böhm, H. J., J. Comput Aided Mol. Des., 12(4):309-323 (1998), which is hereby incorporated in its entirety). However, the Böhm scoring functions as described in the above references are used to estimate the binding affinity of a ligand to a protein where it is already known that the ligand successfully binds to the protein and the protein/ligand complex has had its structure solved, the solved structure being present in the Protein Data Bank (“PDB”). Therefore, the scoring function has been developed with the benefit of known positive binding data. In order to allow for discrimination between positive and negative binders, a repulsion term must be added to the equation. In addition, a more satisfactory estimate of binding energy is achieved by computing the lipophilic interactions in a pairwise manner rather than using the area based energy term of the above Böhm functions. Therefore, in a preferred embodiment, the binding energy is estimated using a modified Böhm scoring function.

In the modified Böhm scoring function, the binding energy between protein and ligand ($\Delta G_{bind}$) is estimated considering the following parameters: the reduction of binding energy due to the overall loss of translational and rotational entropy of the ligand ($\Delta G_0$); contributions from ideal hydrogen bonds ($\Delta G_{hb}$) where at least one partner is neutral; contributions from unperturbed ionic interactions ($\Delta G_{ionic}$); lipophilic interactions between lipophilic ligand atoms and lipophilic acceptor atoms ($\Delta G_{lipo}$); the loss of binding energy due to the freezing of internal degrees of freedom in the ligand, i.e., the freedom of rotation about each C-C bond is reduced ($\Delta G_{rot}$); and the energy of the interaction between the protein and ligand ($E_{VdW}$). Consideration of these terms gives equation 1:

$$\Delta G_{bind} = \Delta G_0 + (\Delta G_{hb} \times N_{hb}) + (\Delta G_{ionic} \times N_{ionic}) + (\Delta G_{lipo} \times N_{lipo}) + (\Delta G_{rot} \times N_{rot}) + E_{VdW}$$

where N is the number of qualifying interactions for a specific term and, in one embodiment, $\Delta G_0$, $\Delta G_{hb}$, $\Delta G_{ionic}$, $\Delta G_{lipo}$ and $\Delta G_{rot}$ are constants which are given the values 5.4, −4.7, −4.7, −0.17, and 1.4, respectively.
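Equation 1 is a weighted sum and can be sketched directly. The sketch below uses the constants given in the text; it takes the rotational contribution as the product of the rotational constant and the number of rotatable bonds, in line with the other weighted terms, which is an assumption about the intended form. The function and constant names are hypothetical.

```python
# Constants from the text, in the order dG0, dG_hb, dG_ionic, dG_lipo, dG_rot.
DG_0 = 5.4
DG_HB = -4.7
DG_IONIC = -4.7
DG_LIPO = -0.17
DG_ROT = 1.4


def binding_energy(n_hb, n_ionic, n_lipo, n_rot, e_vdw):
    """Equation 1: estimated binding energy between protein and ligand.

    n_hb, n_ionic, n_lipo and n_rot are the (possibly fractional) counts of
    qualifying interactions; e_vdw is the van der Waals term of equation 6.
    """
    return (
        DG_0
        + DG_HB * n_hb
        + DG_IONIC * n_ionic
        + DG_LIPO * n_lipo
        + DG_ROT * n_rot      # assumed to be a product, like the other terms
        + e_vdw
    )


# Example with made-up interaction counts, for illustration only.
example = binding_energy(n_hb=2, n_ionic=1, n_lipo=10, n_rot=3, e_vdw=-0.5)
```

With no interactions at all the function returns the entropy penalty alone (5.4), and every hydrogen bond, ionic or lipophilic contact lowers the estimate.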

The term $N_{hb}$ is calculated according to equation 2:

$$N_{hb} = \sum_{h\text{-}bonds} f(\Delta R, \Delta\alpha) \times f(N_{neighb}) \times f_{pcs}$$

$f(\Delta R, \Delta\alpha)$ is a penalty function which accounts for large deviations of hydrogen bonds from ideality and is calculated according to equation 3:

$$f(\Delta R, \Delta\alpha) = f1(\Delta R) \times f2(\Delta\alpha)$$

Where:

$$f1(\Delta R) = \begin{cases} 1 & \text{if } \Delta R \le TOL \\ 1 - (\Delta R - TOL)/0.4 & \text{if } \Delta R \le 0.4 + TOL \\ 0 & \text{if } \Delta R > 0.4 + TOL \end{cases}$$

And:

$$f2(\Delta\alpha) = \begin{cases} 1 & \text{if } \Delta\alpha < 30^{\circ} \\ 1 - (\Delta\alpha - 30)/50 & \text{if } \Delta\alpha \le 80^{\circ} \\ 0 & \text{if } \Delta\alpha > 80^{\circ} \end{cases}$$

-   TOL is the tolerated deviation in hydrogen bond length = 0.25 Å
-   $\Delta R$ is the deviation of the H-O/N hydrogen bond length from the ideal value = 1.9 Å
-   $\Delta\alpha$ is the deviation of the hydrogen bond angle $\angle_{N/O-H..O/N}$ from its idealized value of 180°
-   $f(N_{neighb})$ distinguishes between concave and convex parts of a protein surface and therefore assigns greater weight to polar interactions found in pockets rather than those found at the protein surface. This function is calculated according to equation 4 below:

$$f(N_{neighb}) = (N_{neighb} / N_{neighb,0})^{\alpha} \quad \text{where } \alpha = 0.5$$

-   $N_{neighb}$ is the number of non-hydrogen protein atoms that are closer than 5 Å to any given protein atom.
-   $N_{neighb,0}$ is a constant = 25
-   $f_{pcs}$ is a function which allows for the polar contact surface area per hydrogen bond and therefore distinguishes between strong and weak hydrogen bonds, and its value is determined according to the following criteria:
-   $f_{pcs} = \beta$ when $A_{polar}/N_{HB} < 10$ Å²
-   or $f_{pcs} = 1$ when $A_{polar}/N_{HB} > 10$ Å²
-   $A_{polar}$ is the size of the polar protein-ligand contact surface
-   $N_{HB}$ is the number of hydrogen bonds
-   $\beta$ is a constant whose value = 1.2
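The hydrogen-bond count of equations 2 to 4 can be sketched as a set of small piecewise functions. The constants are those listed above; the function names, the tuple layout used for each hydrogen bond, and the treatment of the boundary case where the polar contact area per bond is exactly 10 Å² (taken as 1 here) are assumptions made for illustration.

```python
TOL = 0.25          # tolerated deviation in hydrogen bond length (angstrom)
N_NEIGHB_0 = 25     # reference neighbour count
ALPHA = 0.5         # exponent of equation 4
BETA = 1.2          # weight for strong hydrogen bonds


def f1(delta_r):
    """Distance penalty of equation 3."""
    if delta_r <= TOL:
        return 1.0
    if delta_r <= 0.4 + TOL:
        return 1.0 - (delta_r - TOL) / 0.4
    return 0.0


def f2(delta_alpha):
    """Angle penalty of equation 3 (degrees)."""
    if delta_alpha < 30.0:
        return 1.0
    if delta_alpha <= 80.0:
        return 1.0 - (delta_alpha - 30.0) / 50.0
    return 0.0


def f_neighb(n_neighb):
    """Equation 4: weights buried (concave) sites more heavily."""
    return (n_neighb / N_NEIGHB_0) ** ALPHA


def f_pcs(a_polar, n_bonds):
    """Polar contact surface factor; exactly 10 is treated as 1 (assumption)."""
    return BETA if a_polar / n_bonds < 10.0 else 1.0


def n_hb(bonds, a_polar):
    """Equation 2. Each bond is (delta_r, delta_alpha, n_neighb)."""
    if not bonds:
        return 0.0
    pcs = f_pcs(a_polar, len(bonds))
    return sum(f1(dr) * f2(da) * f_neighb(nn) * pcs for dr, da, nn in bonds)
```

For example, one ideal hydrogen bond (`delta_r = 0`, `delta_alpha = 0`) at a site with 25 neighbours and 5 Å² of polar contact surface contributes 1.2.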

For the implementation of the modified Böhm scoring function, the contributions from ionic interactions, ΔG_(ionic), are computed in a similar fashion to those from hydrogen bonds described above, since the same geometry dependency is assumed.

The term N_(lipo) is calculated according to equation 5 below:

$$N_{lipo} = \sum_{lL} f(r_{lL})$$

f(r_(lL)) is calculated for all lipophilic ligand atoms, l, and all lipophilic protein atoms, L, according to the following criteria:

-   f(r_(lL))=1 when r_(lL)<=R1
-   f(r_(lL))=(r_(lL)−R1)/(R2−R1) when R1<r_(lL)<R2
-   f(r_(lL))=0 when r_(lL)>=R2
-   Where: R1=r_(l)^(vdw)+r_(L)^(vdw)+0.5
-   and R2=R1+3.0
-   and r_(l)^(vdw) is the van der Waals radius of atom l
-   and r_(L)^(vdw) is the van der Waals radius of atom L

The term N_(rot) is the number of rotatable bonds of the amino acid side chain and is taken to be the number of acyclic sp³-sp³ and sp³-sp² bonds. Rotations of terminal —CH₃ or —NH₃ are not taken into account.

The final term, E_(VdW), is calculated according to equation 6 below:

$$E_{VdW} = \varepsilon_1 \varepsilon_2 \left( \frac{(r_1^{vdw} + r_2^{vdw})^{12}}{r^{12}} - \frac{(r_1^{vdw} + r_2^{vdw})^{6}}{r^{6}} \right)$$

where:

-   ε₁ and ε₂ are constants dependent upon atom identity
-   r₁^(vdw) and r₂^(vdw) are the van der Waals atomic radii
-   r is the distance between a pair of atoms.

With regard to equation 6, in one embodiment, the constants ε₁ and ε₂ are given the atom values C: 0.245, N: 0.283, O: 0.316, S: 0.316 (i.e. for atoms of carbon, nitrogen, oxygen and sulphur, respectively). With regard to equations 5 and 6, the van der Waals radii are given the atom values C: 1.85, N: 1.75, O: 1.60, S: 2.00 Å.
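By way of illustration, equation 6 can be evaluated for a single atom pair with the constants of this embodiment as in the following Python sketch. The function name `e_vdw_pair` and the use of element symbols as dictionary keys are assumptions made for the sketch. How the pair energies are accumulated over a ligand and a protein is not shown here.

```python
# Constants of the embodiment described above, keyed by element symbol.
EPSILON = {"C": 0.245, "N": 0.283, "O": 0.316, "S": 0.316}
VDW_RADIUS = {"C": 1.85, "N": 1.75, "O": 1.60, "S": 2.00}  # Å


def e_vdw_pair(atom1, atom2, r):
    """12-6 term of equation 6 for one atom pair separated by r (Å)."""
    if r <= 0.0:
        raise ValueError("interatomic distance must be positive")
    radius_sum = VDW_RADIUS[atom1] + VDW_RADIUS[atom2]
    ratio6 = (radius_sum / r) ** 6
    # ratio6 ** 2 is the r^-12 (repulsive) part, ratio6 the r^-6 (attractive) part.
    return EPSILON[atom1] * EPSILON[atom2] * (ratio6 ** 2 - ratio6)
```

With this form the term is zero when the separation equals the sum of the two radii, positive (repulsive) at shorter separations and negative (attractive) at longer ones.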

It should be understood that all predetermined values and constants given in the equations above are determined within the constraints of current understandings of protein-ligand interactions, with particular regard to the type of computation being undertaken herein. Therefore, it is possible that, as this scoring function is refined further, these values and constants may change. Hence any suitable numerical value which gives the desired results in terms of estimating the binding energy of a protein to a ligand may be used and falls within the scope of the present invention.

As described above, the scoring function is applied to data extracted from the database of side-chain conformations, atom identities, and interatomic distances. For the purposes of the present description, the number of MHC Class II molecules included in this database is 42 models plus four solved structures. It should be apparent from the above descriptions that the modular nature of the construction of the computational method of the present invention means that new models can simply be added and scanned with the peptide backbone library and side-chain conformational search function to create additional data sets which can be processed by the peptide scoring function as described above. This allows for the repertoire of scanned MHC Class II molecules to easily be increased, or structures and associated data to be replaced if data are available to create more accurate models of the existing alleles.

The present prediction method can be calibrated against a data set comprising a large number of peptides whose affinity for various MHC Class II molecules has previously been experimentally determined. By comparison of calculated versus experimental data, a cut-off value can be determined above which it is known that all experimentally determined T-cell epitopes are correctly predicted.
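One possible reading of this calibration step is sketched below in Python. It assumes a calibration set of (calculated score, experimentally confirmed epitope) pairs and takes as the threshold the lowest calculated score among the confirmed epitopes, so that every confirmed epitope is predicted. The function names, the data layout and the inclusive comparison are assumptions made for the sketch.

```python
def calibrate_cutoff(scored_peptides):
    """Return the lowest calculated score among experimentally confirmed epitopes.

    scored_peptides: iterable of (calculated_score, is_known_epitope) pairs
    (hypothetical layout).
    """
    epitope_scores = [score for score, is_epitope in scored_peptides if is_epitope]
    if not epitope_scores:
        raise ValueError("calibration set contains no confirmed epitopes")
    return min(epitope_scores)


def count_false_positives(scored_peptides, cutoff):
    """Count non-epitope peptides that would still be predicted at this threshold."""
    return sum(
        1 for score, is_epitope in scored_peptides
        if not is_epitope and score >= cutoff
    )
```

For example, with the made-up calibration set `[(5.0, True), (3.2, True), (4.0, False), (1.0, False)]` the threshold is 3.2 and one non-epitope peptide is still predicted as a binder.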

It should be understood that, although the above scoring function is relatively simple compared to some sophisticated methodologies that are available, the calculations are performed extremely rapidly. It should also be understood that the objective is not to calculate the true binding energy per se for each peptide docked in the binding groove of a selected MHC Class II protein. The underlying objective is to obtain comparative binding energy data as an aid to predicting the location of T-cell epitopes based on the primary structure (i.e. amino acid sequence) of a selected protein. A relatively high binding energy or a binding energy above a selected threshold value would suggest the presence of a T-cell epitope in the ligand. The ligand may then be subjected to at least one round of amino-acid substitution and the binding energy recalculated. Due to the rapid nature of the calculations, these manipulations of the peptide sequence can be performed interactively within the program's user interface on cost-effectively available computer hardware. Major investment in computer hardware is thus not required.
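The scan-threshold-substitute cycle described above can be sketched in Python as follows. The scoring function is passed in as a parameter because the real score depends on the MHC Class II models and backbone library. The `toy_score` function, its residue values, the window size and the example sequence are placeholders invented for this sketch and carry no biological meaning.

```python
def scan_segments(sequence, window, score_fn, threshold):
    """Score every overlapping segment and keep those above the threshold."""
    hits = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        score = score_fn(segment)
        if score > threshold:
            hits.append((start, segment, score))
    return hits


def substitute(segment, position, residue):
    """Return a copy of segment with one residue replaced."""
    return segment[:position] + residue + segment[position + 1:]


# Placeholder values (assumption): 1.0 for a few hydrophobic residues, 0 otherwise.
TOY_VALUES = {aa: 1.0 for aa in "LIVFMWY"}


def toy_score(segment):
    """Stand-in for the real scoring function; illustration only."""
    return sum(TOY_VALUES.get(aa, 0.0) for aa in segment)


hits = scan_segments("AKLLVFGS", 3, toy_score, 2.0)
variant = substitute(hits[0][1], 1, "A")  # one round of substitution
```

Here the first flagged segment is rescored after a single substitution, and a variant would be retained only if its recalculated score falls below that of the original segment.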

It would be apparent to one skilled in the art that other available software could be used for the same purposes. In particular, more sophisticated software which is capable of docking ligands into protein binding-sites may be used in conjunction with energy minimization. Examples of docking software are: DOCK (Kuntz et al., J. Mol. Biol., 161:269-288 (1982)), LUDI (Böhm, H. J., J. Comput. Aided Mol. Des., 8:623-632 (1994)) and FLEXX (Rarey, M., et al., ISMB, 3:300-308 (1995)). Examples of molecular modeling and manipulation software include: AMBER (Tripos) and CHARMm (Molecular Simulations Inc.). The use of these computational methods would severely limit the throughput of the method of this invention due to the lengths of processing time required to make the necessary calculations. However, it is feasible that such methods could be used as a ‘secondary screen’ to obtain more accurate calculations of binding energy for peptides which are found to be ‘positive binders’ via the method of the present invention. The limitation of processing time for sophisticated molecular mechanics or molecular dynamics calculations is one which is defined both by the design of the software which makes these calculations and the current technology limitations of computer hardware. It may be anticipated that, in the future, with the writing of more efficient code and the continuing increases in speed of computer processors, it may become feasible to make such calculations within a more manageable time-frame. Further information on energy functions applied to macromolecules and consideration of the various interactions that take place within a folded protein structure can be found in: Brooks, B. R., et al., J. Comput. Chem., 4:187-217 (1983) and further information concerning general protein-ligand interactions can be found in: Dauber-Osguthorpe et al., Proteins, 4(1):31-47 (1988), which are incorporated herein by reference in their entirety. Useful background information can also be found, for example, in Fasman, G. D., ed., Prediction of Protein Structure and the Principles of Protein Conformation, Plenum Press, New York, ISBN: 0-306 4313-9.

1. A method of preparing a modified granulocyte colony stimulating factor (G-CSF) protein comprising the steps of:
(i) identifying one or more potential T-cell epitopes within the amino acid sequence of human G-CSF (SEQ ID NO: 1);
(ii) designing at least one sequence variant of at least one potential T-cell epitope identified in step (i), wherein the sequence variant eliminates or substantially reduces the MHC class II binding activity of the potential T-cell epitope;
(iii) preparing, by recombinant DNA techniques, at least one modified G-CSF protein including a sequence variant designed in step (ii), the amino acid sequence of the modified G-CSF consisting of SEQ ID NO: 1 with 1 to 9 amino acid substitutions or deletions therein or additions thereto, wherein substitutions are selected from the group of amino acid substitutions set forth in Table 2 and Table 3;
(iv) evaluating at least one modified G-CSF protein prepared in step (iii) for therapeutic G-CSF biological activity and immunogenicity; and
(v) selecting a modified G-CSF protein evaluated in step (iv) that has substantially the same therapeutic G-CSF biological activity as human G-CSF, but substantially less immunogenicity than human G-CSF;
wherein step (i) is carried out by: (a) selecting a region of human G-CSF having a known amino acid sequence; (b) sequentially sampling overlapping amino acid residue segments of predetermined uniform size, and including at least three amino acid residues, from the selected region; and (c) calculating a MHC class II binding score for each sequentially sampled amino acid residue segment by summing assigned values for each hydrophobic amino acid residue side chain present in each sequentially sampled amino acid residue segment, and thereby obtaining a calculated MHC class II binding score therefor;
step (ii) is carried out by: (d) identifying a desired segment from among the sequentially sampled amino acid residue segments that is suitable for modification, based on the calculated MHC class II binding score therefor; (e) calculating MHC class II binding scores for sequence variants of the desired segment; (f) selecting from said sequence variants a sequence variant that has a lower MHC class II binding score than the MHC class II binding score of the desired segment; and
step (c) is carried out using a modified Böhm scoring function including 12-6 van der Waals ligand-protein energy repulsive terms by: (1) selecting a model from a first database of MHC class II molecule models; (2) selecting an allowed peptide backbone from a second database of allowed peptide backbones for the MHC class II molecule models in step (1); (3) identifying amino acid residue side chains present in each sampled segment; (4) determining a binding affinity value for all side chains present in each sampled segment; and (5) repeating each of (1) through (4) for each model in the first database and for each backbone in the second database.